1 Introduction

In the Bayesian paradigm for presenting forensic evidence to court, it is
recommended that the weight of the evidence be summarized as a likelihood
ratio (LR) between two opposing hypotheses of how the evidence could have
been produced. Such LRs are necessarily based on probabilistic models, the
parameters of which may be uncertain. It has been suggested by some authors
that the value of the LR, being a function of the model parameters, should
therefore also be considered uncertain and that this uncertainty should be
communicated to the court.

In this tutorial, we consider a simple example of a fully Bayesian solution,
where model uncertainty is integrated out to produce a value for the LR which
is not uncertain. We show that this solution agrees with common sense. In
particular, the LR magnitude is a function of the amount of data that is
available to estimate the model parameters.

Bayesian methods are often criticised because of the difficulty of choosing 
appropriate priors, especially when the priors are non-informative. We do not 
deny these difficulties, but the problem is not solved by adopting frequentist 
methods that effectively sweep the prior under the carpet and pretend it does 
not exist. In this tutorial we do need to choose a non-informative prior and
we choose it by examining the effect it has on the end result.

We shall reference the following books: E.T. Jaynes, Probability Theory:
The Logic of Science, Cambridge University Press 2003, which we shall
abbreviate as PTLOS; and D.J. Balding, Weight-of-evidence for Forensic DNA
Profiles, Wiley 2005, abbreviated as WEFDNA.

2 Simplified DNA model

In this tutorial we shall derive the details of how to compute the LR with 
a simplified DNA-like model. The idea is not to provide a recipe that can 
be used in real forensic DNA analysis, but rather to choose a model that 
facilitates better understanding of the basic look and feel of a fully Bayesian 
solution. We need the model to be very simple so that we can perform the 
Bayesian integrals in closed form. More realistic models would require more 
complex methods, which would obscure the primary purpose of this tutorial. 

We suppose that the DNA profile of every individual has $K$ different
binary loci, the state of each of which can be either 1 or 0. Every individual
is therefore categorized by $K$ binary variables, which gives a total of
$2^K$ states.[1] We represent a DNA profile by a vector of the form
$a = (a_1, a_2, \ldots, a_K)$, where $a_k \in \{0, 1\}$ represents the state
of locus $k$.

We assume that given a DNA sample (either recovered at the crime scene 
where it was left by the perpetrator, or obtained from the suspect), the state 
of each locus may be determined without error. 

The main complication is that, even when all suspect and perpetrator loci
match, there is a non-zero probability that some person other than the suspect
has the same DNA profile. To compute this probability, we need to
model profile distributions.

3 Profile distribution model 

Here we define a generative model that is probably about as simple as it 
can be. Again, our goal is just to illustrate the basic principles of a fully 
Bayesian approach to this kind of problem. The goal of this exercise is not 
to reproduce a realistic DNA model — in real population genetics, the models 
are more complex. 

Let the probability that locus $k$ of a randomly chosen person has state 1
be $q_k$, and the probability that it has state 0 be $1 - q_k$. According to
this model we assume the following independencies:

• The locus states are independent: knowing the state of locus $k$ for one
or more individuals tells us nothing about the states of other loci $k'$.

1 In real DNA profiling, there are different locus types, with more complex state spaces. 
For example, STR loci consist of two parts with independent states, one inherited from the 
father and the other from the mother. Each part has 2 or more states, called alleles. DNA 
profiling technology can detect the state of each part, but does not show which comes from 
the mother and which from the father. 



• For each locus $k$, the binary state for each person is sampled as an iid
Bernoulli trial with parameter $q_k$.

We can collect the locus probabilities in the vector[2]
$q = (q_1, q_2, \ldots, q_K)$. We refer to $q$ as the model parameter, which
encodes everything there is to know (under the above modelling assumptions)
about how locus states are distributed in the population. The model can be
summarized by:

$$P(a|q) = \prod_{k=1}^{K} q_k^{a_k} (1 - q_k)^{1 - a_k} \tag{1}$$

which is the probability that a randomly chosen individual has DNA profile
$a = (a_1, a_2, \ldots, a_K)$ in a population characterized by the model
parameter $q = (q_1, q_2, \ldots, q_K)$. The complication is that we are not
given $q$. Its value has to be inferred from prior assumptions and from data.
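To make the model concrete, the product in (1) can be evaluated directly. The following is a minimal Python sketch; the function name `profile_probability` and the example values of $q$ are illustrative assumptions, not part of the original derivation.

```python
from math import prod

def profile_probability(a, q):
    """P(a | q): probability of binary profile a under the independent-locus model."""
    if len(a) != len(q):
        raise ValueError("a and q must have one entry per locus")
    # Each locus contributes q_k if its state is 1 and (1 - q_k) if it is 0.
    return prod(qk if ak == 1 else 1.0 - qk for ak, qk in zip(a, q))

# Hypothetical parameter vector for K = 3 loci (values chosen for illustration).
q = [0.5, 0.25, 0.8]
p = profile_probability([1, 0, 1], q)  # 0.5 * 0.75 * 0.8
```

Summing `profile_probability` over all $2^K$ profiles gives 1, as it must for a distribution over profiles.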

4 Inferring the model parameter 

We do a Bayesian inference for the value of q, by computing a posterior 
distribution. 

4.1 Prior 

As prior for $q_k$, we assign a beta distribution. This choice has a threefold
motivation: (i) The beta distribution is a conjugate prior for this problem,
which allows for closed-form Bayesian calculations. (ii) It is commonly used
in forensic DNA practice. (iii) It is general enough to include various
non-informative priors, which will be of special interest to us.

We assign independently for each $q_k$ a beta distribution with
hyperparameter $\pi_k = (\alpha_k, \beta_k)$, so that:

$$P(q|\pi) = \prod_{k=1}^{K} \mathrm{Beta}(q_k|\alpha_k, \beta_k) \tag{2}$$

$$= \prod_{k=1}^{K} \frac{q_k^{\alpha_k - 1} (1 - q_k)^{\beta_k - 1}}{B(\alpha_k, \beta_k)} \tag{3}$$

where we have defined $\pi = (\pi_1, \pi_2, \ldots, \pi_K)$. The normalization constant of
the beta distribution is given by the beta function, defined as:

$$B(\alpha, \beta) = \frac{\Gamma(\alpha) \Gamma(\beta)}{\Gamma(\alpha + \beta)} = \int_0^1 q^{\alpha - 1} (1 - q)^{\beta - 1}\, dq \tag{4}$$

where $\Gamma$ is the gamma function.

2 Note that the elements of $q$ usually do not sum to one. These are $K$ independent
probabilities, not one $K$-ary categorical distribution.

For the beta distribution to be normalized, we need $\alpha_k, \beta_k > 0$ and unless
stated otherwise, we shall assume this condition holds for all our calculations
below. In places, we will however consider the limit as $\alpha_k = \beta_k \to 0$. When
we do this, we will follow the advice of PTLOS and complete the whole
calculation under the assumption $\alpha_k, \beta_k > 0$ and apply the limit only to the
final result.

4.1.1 Non-informative priors 

If we want to use a non-informative prior, we let $\alpha = \alpha_k = \beta_k$ by symmetry,
and we can choose some $\alpha$, for example in the range $0 < \alpha \le 1$. The case
$\alpha \to 0$ is called the Haldane prior, the case $\alpha = 0.5$ is the Jeffreys prior and
$\alpha = 1$ is the Laplace prior.

The Haldane prior is flat in the sense that the probability density for
$\log \frac{q}{1 - q}$ is uniform, but since this reparametrization of $q$ covers the whole real
line, this prior is improper.

The Jeffreys prior is flat in the sense that the probability density for
$\arcsin(2q - 1)$ is uniform between $-\frac{\pi}{2}$ and $\frac{\pi}{2}$.

The Laplace prior is flat in the sense that the probability density for $q$ is
uniform between 0 and 1.

As these names show, different workers in probability theory have ar- 
rived at different conclusions about which prior should be used to encode 
non-informativeness about the Bernoulli model parameter. To make our cal- 
culations concrete, we will have to make a definite choice of prior. We shall 
solve this problem in a later section, by examining the effect of the prior on 
the end-result of our calculation. 

4.1.2 Informative prior 

In forensic DNA,[3] it is customary to reparametrize the beta prior as:

$$\alpha_k = \frac{1 - \theta}{\theta}\, p_k, \qquad \beta_k = \frac{1 - \theta}{\theta}\, (1 - p_k) \tag{5}$$



3 See WEFDNA pp. 63-64. 



where $0 < p_k < 1$ and $0 < \theta < 1$. Here $\theta$ is known as the population
structure parameter. With this parametrization, $\mathrm{Beta}(q_k|\alpha_k, \beta_k)$ has the
following mean and variance:

$$\langle q_k \rangle = \frac{\alpha_k}{\alpha_k + \beta_k} = p_k, \qquad \langle (q_k - p_k)^2 \rangle = \theta\, p_k (1 - p_k) \tag{6}$$

For small values of $\theta$, one obtains an informative prior, with a small variance
and a sharp peak near $p_k$. In the extreme as $\theta \to 0$, we get a strongly
informative prior, which overrides contributions made by finite data and
therefore asserts $q_k = p_k$.

For the case $p_k = \frac{1}{2}$ and $\theta = \frac{1}{2\alpha + 1} \ge \frac{1}{3}$, we recover the above-mentioned
non-informative priors: Laplace at $\theta = \frac{1}{3}$, Jeffreys at $\theta = \frac{1}{2}$ and in the
extreme as $\theta \to 1$, the Haldane prior, which gives maximum weight to the
data. These effects will be shown below.
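The mapping in (5) and the moments in (6) can be checked with exact rational arithmetic. In the sketch below the helper names `beta_params` and `beta_mean_var` are hypothetical, and the variance is the standard formula for a beta distribution rather than anything specific to this tutorial.

```python
from fractions import Fraction

def beta_params(p, theta):
    """Map (p_k, theta) to (alpha_k, beta_k) as in (5)."""
    scale = (1 - theta) / theta  # size of the pseudo-database
    return scale * p, scale * (1 - p)

def beta_mean_var(alpha, beta):
    """Mean and variance of a Beta(alpha, beta) distribution."""
    total = alpha + beta
    return alpha / total, alpha * beta / (total**2 * (total + 1))

half = Fraction(1, 2)
laplace = beta_params(half, Fraction(1, 3))    # expected: alpha = beta = 1
jeffreys = beta_params(half, Fraction(1, 2))   # expected: alpha = beta = 1/2

# Hypothetical informative prior: p_k = 1/5 and theta = 2%.
mean, var = beta_mean_var(*beta_params(Fraction(1, 5), Fraction(1, 50)))
```

For the informative example, `mean` equals $p_k$ and `var` equals $\theta p_k (1 - p_k)$, in agreement with (6).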



4.2 Database 

We make provision in our calculation to optionally use a database of examples
to help us infer values for $q$. Let $A = (a_1, a_2, \ldots, a_L)$ be a database of
DNA profiles for $L$ different individuals, where the profile for individual $\ell$ is
$a_\ell = (a_{1\ell}, a_{2\ell}, \ldots, a_{K\ell})$ and where $a_{k\ell} \in \{0, 1\}$ is the binary
state of locus $k$ of individual $\ell$. We assume the DNA profiles in $A$:

• have been sampled iid from the same population as the suspect and 
perpetrator and are therefore relevant to inferring the parameter q, 

• but the individuals are distinct from the suspect and the perpetrator. 

Our calculations will allow for the case of the empty database, where 
L = 0. 

4.3 Likelihood 

Because of our independence assumptions in the model, the likelihood for $q$,
given the database $A$, is:

$$P(A|q) = \prod_{\ell=1}^{L} \prod_{k=1}^{K} q_k^{a_{k\ell}} (1 - q_k)^{1 - a_{k\ell}} \tag{7}$$

$$= \prod_{k=1}^{K} q_k^{n_k} (1 - q_k)^{L - n_k} \tag{8}$$

where $n_k = \sum_{\ell=1}^{L} a_{k\ell}$ is the number of times locus $k$ has state 1 and $L - n_k$
is the number of times it has state 0.



4.4 Posterior 

We can now infer the value of $q$ by computing the posterior:

$$P(q|A, \pi) = \frac{P(q|\pi) P(A|q)}{\int P(q'|\pi) P(A|q')\, dq'} \tag{9}$$

$$= \prod_{k=1}^{K} \frac{q_k^{\alpha_k + n_k - 1} (1 - q_k)^{\beta_k + L - n_k - 1}}{B(\alpha_k + n_k, \beta_k + L - n_k)} \tag{10}$$

$$= \prod_{k=1}^{K} \mathrm{Beta}(q_k|\alpha_k + n_k, \beta_k + L - n_k) \tag{11}$$

where the integral in the denominator was solved by inspection, by recognizing
the numerator as another beta distribution. This is due to the fact that
the beta distribution is conjugate to the Bernoulli likelihood and therefore
should result in a beta posterior. Notice that if the database is empty, then
$n_k = L = 0$ and the posterior is just the prior.

The prior parameters $\alpha_k$ and $\beta_k$ play the same roles mathematically as
the event counts $n_k$ and $L - n_k$ and are consequently referred to as
pseudo-counts. The total pseudo-count, $\alpha_k + \beta_k$, can be interpreted as the size
of some pseudo-database, which is then effectively pooled with $A$ by the
additions in (11).

In the alternative prior parametrization, $\frac{1 - \theta}{\theta} p_k$ and $\frac{1 - \theta}{\theta} (1 - p_k)$ are the
pseudo-counts and $\frac{1 - \theta}{\theta}$ is the size of the pseudo-database.

The posterior $P(q|A, \pi)$ represents our total state of knowledge about $q$
and can be used in all calculations in place of the unknown $q$.
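The pooling of pseudo-counts with database counts can be sketched in a few lines of Python. The function names and the small example database below are hypothetical; the update itself is just the addition of counts in (11), with the same $\alpha$ and $\beta$ assumed for every locus.

```python
def locus_counts(database, K):
    """n_k for each locus: how many of the L profiles have state 1 at locus k."""
    return [sum(profile[k] for profile in database) for k in range(K)]

def posterior_params(alpha, beta, database, K):
    """Beta posterior parameters (alpha + n_k, beta + L - n_k) for every locus."""
    L = len(database)
    return [(alpha + n, beta + L - n) for n in locus_counts(database, K)]

# Hypothetical database of L = 4 profiles over K = 3 loci.
A = [[1, 0, 1], [1, 1, 0], [0, 0, 1], [1, 0, 1]]
with_data = posterior_params(0.5, 0.5, A, K=3)  # Jeffreys prior plus counts
no_data = posterior_params(0.5, 0.5, [], K=3)   # empty database: posterior is the prior
```

`K` is passed explicitly so that the empty database ($L = 0$) is handled without special cases.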

5 Forensic LR 

We are given two DNA profiles: one for the suspect, $s = (s_1, s_2, \ldots, s_K)$, and
one for the perpetrator, $r = (r_1, r_2, \ldots, r_K)$. We work with two hypotheses
and assume they are the only possible explanations for the observed data
$s, r$:

• The prosecution hypothesis, $H_p$, asserts that suspect and perpetrator
are the same person.

• The defence hypothesis, $H_d$, asserts that they are different individuals.

Below we compute the likelihoods under each hypothesis. For now, we assume
that if they don't match, $r \ne s$, then in the absence of DNA measurement
errors, this proves deductively that $H_d$ is true and $H_p$ is false.

In the matched case, $r = s$, however, we need probabilistic reasoning.
The most natural way to do this would be to compute the posterior,

$$P(H_p|r, s, A, \pi, \Pi) = 1 - P(H_d|r, s, A, \pi, \Pi) \tag{12}$$

where we have introduced the prior for the prosecution hypothesis,

$$\Pi = P(H_p|\Pi) = 1 - P(H_d|\Pi) \tag{13}$$

which is assigned by a reasoning process not involving DNA profiles. However,
in the Bayesian paradigm for presenting evidence in court one equivalently
considers the posterior odds for $H_p$ against $H_d$, which can be separated[4]
into two factors: likelihood ratio and prior odds, respectively representing the
contributions of the DNA analysis and all other evidence not related to DNA:

$$\frac{P(H_p|r, s, A, \pi, \Pi)}{P(H_d|r, s, A, \pi, \Pi)} = \mathrm{LR} \times \frac{\Pi}{1 - \Pi} \tag{14}$$

where

$$\mathrm{LR} = \frac{P(r, s|H_p, A, \pi)}{P(r, s|H_d, A, \pi)} \tag{15}$$

is referred to as the likelihood ratio. It is then recommended that the
end-goal of the forensic DNA analysis is to compute LR, which can be done
independently of $\Pi$. We derive expressions for both likelihoods below and
then form the ratio.

Finally, notice that if LR = 1, then the DNA analysis is completely
non-informative about $H_p$ versus $H_d$: in this case the posterior (odds) is the same
as the prior (odds).

5.1 Prosecution likelihood 

Under the prosecution hypothesis, $r$ and $s$ come from the same individual,
so that $P(r, s|H_p, q) = \delta(r, s) P(s|q)$, where $\delta(r, r) = 1$, or $\delta(r, s) = 0$ if
$r \ne s$. Since we are not given $q$, but instead we are given the prior $\pi$ and
the database $A$, we must condition on what we have and instead compute:

$$P(r, s|H_p, \pi, A) = \delta(r, s) P(s|\pi, A) \tag{16}$$



4 In the real world, this simple factorization applies only in a limited number of cases. 
If different alternative culprits, with different levels of relatedness to the suspect are con- 
sidered, somewhat more general formulas have to be used, as explained in WEFDNA. 



where

$$P(s|\pi, A) = \int_0^1 \int_0^1 \cdots \int_0^1 P(s|q) P(q|\pi, A)\, dq_1\, dq_2 \cdots dq_K \tag{17}$$

$$= \int_Q P(s|q) P(q|\pi, A)\, dq \tag{18}$$

where $Q$ is short-hand for the $K$-cube over which we are integrating. Note
$P(s|\pi, A)$ is called the predictive distribution for $s$, because it predicts the
value of an as yet unseen profile, given that we have already seen the profiles
in $A$. Again by virtue of the conjugate prior, the predictive distribution can
be found in closed form:

$$P(s|\pi, A) = \int_Q P(s|q) P(q|\pi, A)\, dq \tag{19}$$

$$= \prod_{k=1}^{K} \frac{\int_0^1 q_k^{s_k} (1 - q_k)^{1 - s_k}\, q_k^{\alpha_k + n_k - 1} (1 - q_k)^{\beta_k + L - n_k - 1}\, dq_k}{B(\alpha_k + n_k, \beta_k + L - n_k)} \tag{20}$$

$$= \prod_{k=1}^{K} \frac{\int_0^1 q_k^{\alpha_k + s_k + n_k - 1} (1 - q_k)^{\beta_k + L + 1 - s_k - n_k - 1}\, dq_k}{B(\alpha_k + n_k, \beta_k + L - n_k)} \tag{21}$$

$$= \prod_{k=1}^{K} \frac{B(\alpha_k + s_k + n_k, \beta_k + L + 1 - s_k - n_k)}{B(\alpha_k + n_k, \beta_k + L - n_k)} \tag{22}$$

$$= \prod_{k=1}^{K} P(s_k|\pi_k, n_k, L) \tag{23}$$

Now we can expand the beta functions in terms of gamma functions and
simplify the ratios of gammas with the identity $\Gamma(x + 1) = x \Gamma(x)$, to find the
predictive probability:[5]

$$P(s_k = 1|\pi_k, n_k, L) = \frac{\alpha_k + n_k}{\alpha_k + \beta_k + L} \tag{24}$$

$$= \frac{(1 - \theta) p_k + \theta n_k}{(1 - \theta) + \theta L} \tag{25}$$

For the informative prior case, notice that $\theta$ gives interpolation weights
between data and the prior parameter $p_k$. At the one extreme if $\theta \to 1$ (Haldane
prior), we disregard the prior parameter $p_k$ and end up with just the data
proportion $\frac{n_k}{L}$. At the other extreme if $\theta = 0$, we disregard the data $A$ and
end up with the prior parameter $p_k$. (If we use the non-informative Laplace
prior, with $\alpha_k = \beta_k = 1$, then (24) is known as Laplace's rule of succession.)
Finally, the predictive probability[6] for the event $s_k = 0$ is:

$$P(s_k = 0|\pi_k, n_k, L) = \frac{\beta_k + L - n_k}{\alpha_k + \beta_k + L} \tag{26}$$

$$= \frac{(1 - \theta)(1 - p_k) + \theta (L - n_k)}{(1 - \theta) + \theta L} \tag{27}$$

5 Notice (25) agrees with equation 5.6 on page 64 in WEFDNA.

Note that even for an empty database ($L = n_k = 0$), our assumption $\alpha_k, \beta_k > 0$
guarantees non-zero predictive probabilities.
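As a quick check that the two parametrizations of the predictive probability agree, here is a small Python sketch using exact fractions. The function names and the example counts are assumptions made for illustration only.

```python
from fractions import Fraction

def predictive_one(alpha, beta, n, L):
    """P(s_k = 1 | pi_k, n_k, L) in the (alpha, beta) form of (24)."""
    return (alpha + n) / (alpha + beta + L)

def predictive_one_theta(p, theta, n, L):
    """The same probability in the (p_k, theta) form of (25)."""
    return ((1 - theta) * p + theta * n) / ((1 - theta) + theta * L)

# Laplace prior (alpha = beta = 1, i.e. p = 1/2 and theta = 1/3) with a
# hypothetical locus seen in state 1 in n = 3 of L = 4 database profiles.
via_counts = predictive_one(Fraction(1), Fraction(1), 3, 4)
via_theta = predictive_one_theta(Fraction(1, 2), Fraction(1, 3), 3, 4)

# Empty database: the prediction falls back to the prior mean.
empty = predictive_one(Fraction(1), Fraction(1), 0, 0)
```

Both forms give 2/3 here, which is Laplace's rule of succession $(n_k + 1)/(L + 2)$ for these counts.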

5.2 Defence likelihood 

Under the defence hypothesis, r and s come from different individuals and 
their probabilities are independent given q, so that P(r, s|q) = P(r|q)P(s|q). 
However, q is not given, so the independence no longer holds: knowledge of 
one profile changes the probability for q, which in turn changes the proba- 
bility for the other profile. This dependency is automatically taken care of 
by applying the rules of probability theory by integrating out the unknown 
q: 

$$P(r, s|H_d, \pi, A) = \int_Q P(r|q) P(s|q) P(q|\pi, A)\, dq \tag{28}$$

$$= \prod_{k=1}^{K} \frac{B(\alpha_k + s_k + r_k + n_k, \beta_k + L + 2 - r_k - s_k - n_k)}{B(\alpha_k + n_k, \beta_k + L - n_k)} \tag{29}$$

$$= \prod_{k=1}^{K} P(r_k, s_k|\pi_k, n_k, L) \tag{30}$$

where we can expand and simplify again to find the predictive probability:

$$P(r_k = s_k = 1|\pi_k, n_k, L) = \frac{\alpha_k + n_k}{\alpha_k + \beta_k + L} \cdot \frac{\alpha_k + n_k + 1}{\alpha_k + \beta_k + L + 1} \tag{31}$$

$$= P(r_k = 1|\pi_k, n_k, L)\, P(s_k = 1|\pi_k, n_k + 1, L + 1) \tag{32}$$

Notice the similarity between the two factors in the RHS: the right factor
is obtained from the left by adding 1's to the observation counts. Notice
also that if $\alpha_k + n_k \gg 1$, then
$P(r_k = s_k = 1|\pi_k, n_k, L) \approx P(r_k = 1|\pi_k, n_k, L)\, P(s_k = 1|\pi_k, n_k, L)$,
making the two events almost independent.

6 Notice $P(s_k = 0|\alpha_k, \beta_k, n_k, L) + P(s_k = 1|\alpha_k, \beta_k, n_k, L) = 1$.



The probability for the other event of interest[7] is obtained similarly as:

$$P(r_k = s_k = 0|\pi_k, n_k, L) = P(r_k = 0|\pi_k, n_k, L)\, P(s_k = 0|\pi_k, n_k, L + 1) \tag{33}$$
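The step from the ratio of beta functions to the product of two predictive probabilities can be verified numerically. The sketch below evaluates the beta-function ratio via `math.lgamma` and compares it with the closed form for $r_k = s_k = 1$; the function names and the numerical values are hypothetical.

```python
from math import exp, lgamma

def log_beta(a, b):
    """Natural log of the beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def joint_predictive(r, s, alpha, beta, n, L):
    """P(r_k, s_k | pi_k, n_k, L) as the ratio of beta functions in (29)."""
    num = log_beta(alpha + r + s + n, beta + L + 2 - r - s - n)
    den = log_beta(alpha + n, beta + L - n)
    return exp(num - den)

def joint_both_one(alpha, beta, n, L):
    """Closed form (31) for the event r_k = s_k = 1."""
    return (alpha + n) / (alpha + beta + L) * (alpha + n + 1) / (alpha + beta + L + 1)

# Hypothetical values: Jeffreys prior, state 1 seen 3 times in 10 profiles.
alpha, beta, n, L = 0.5, 0.5, 3, 10
from_beta_ratio = joint_predictive(1, 1, alpha, beta, n, L)
from_closed_form = joint_both_one(alpha, beta, n, L)

# Sanity check: the four joint outcomes (r_k, s_k) form a distribution.
total = sum(joint_predictive(r, s, alpha, beta, n, L) for r in (0, 1) for s in (0, 1))
```

Working with log-gamma values avoids overflow when the counts are large; the two routes agree to floating-point precision.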

5.3 LR 

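Forming the likelihood-ratio, we find:

$$\mathrm{LR} = \frac{P(r, s|H_p, \pi, A)}{P(r, s|H_d, \pi, A)} = \prod_{k=1}^{K} \mathrm{LR}_k(r_k, s_k) \tag{34}$$

where

$$\mathrm{LR}_k(r, s) = \frac{\delta(r, s) P(s|\pi_k, n_k, L)}{P(r, s|\pi_k, n_k, L)} \tag{35}$$

$$= \frac{\delta(r, s) P(s|\pi_k, n_k, L)}{P(s, s|\pi_k, n_k, L)} \tag{36}$$

$$= \frac{\delta(r, s) P(s|\pi_k, n_k, L)}{P(s|\pi_k, n_k, L)\, P(s|\pi_k, n_k + s, L + 1)} \tag{37}$$

$$= \frac{\delta(r, s)}{P(s|\pi_k, n_k + s, L + 1)} \tag{38}$$

More explicitly, for the mismatched cases we have

$$\mathrm{LR}_k(0, 1) = \mathrm{LR}_k(1, 0) = 0 \tag{39}$$

and for the matched cases we have

$$\mathrm{LR}_k(1, 1) = \frac{\alpha_k + \beta_k + L + 1}{\alpha_k + n_k + 1}, \qquad \mathrm{LR}_k(0, 0) = \frac{\alpha_k + \beta_k + L + 1}{\beta_k + L - n_k + 1} \tag{40}$$

or, with the other prior parametrization:

$$\mathrm{LR}_k(1, 1) = \frac{(1 - \theta) + \theta (L + 1)}{(1 - \theta) p_k + \theta (n_k + 1)} \tag{41}$$

and

$$\mathrm{LR}_k(0, 0) = \frac{(1 - \theta) + \theta (L + 1)}{(1 - \theta)(1 - p_k) + \theta (L - n_k + 1)} \tag{42}$$

Notice again that $\theta$ interpolates between data and the prior parameter $p_k$.
The minimum value (for the matched case $r_k = s_k$) is 1. This is a consequence
of the error-free measurement assumption. If non-zero error probabilities
were considered, values of less than 1 would be possible.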
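The per-locus factors and their product over loci can be put together in a short Python sketch. The names `locus_lr` and `profile_lr`, and the example counts, are hypothetical; the code assumes the same $\alpha$ and $\beta$ at every locus and error-free locus states, as in the text.

```python
from fractions import Fraction
from math import prod

def locus_lr(r, s, alpha, beta, n, L):
    """LR_k(r_k, s_k) following (39) and (40); assumes error-free locus states."""
    if r != s:
        return Fraction(0)  # a mismatch excludes the prosecution hypothesis
    # Pseudo-count plus database count of the matched state.
    matched = alpha + n if s == 1 else beta + L - n
    return (alpha + beta + L + 1) / (matched + 1)

def profile_lr(r, s, alpha, beta, counts, L):
    """Product of the per-locus factors, as in (34)."""
    return prod(locus_lr(rk, sk, alpha, beta, nk, L)
                for rk, sk, nk in zip(r, s, counts))

# Hypothetical case: Jeffreys prior, L = 4 database profiles, counts n_k = (3, 1, 3).
half = Fraction(1, 2)
lr_match = profile_lr([1, 0, 1], [1, 0, 1], half, half, [3, 1, 3], 4)
lr_mismatch = profile_lr([1, 0, 1], [1, 1, 1], half, half, [3, 1, 3], 4)
```

In this example every matched locus contributes a factor 4/3, so the full-profile LR is $(4/3)^3 = 64/27$, while a single mismatched locus drives the product to zero.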



7 We don't need the events (0, 1) and (1, 0) here, because we are interested in the case
where profiles match.

6 Plug-in recipe

In this section, we shall refer to: 

• One or more reference populations, from which one or more databases
are drawn to help to estimate the parameters $p_k$ and $\theta$ for an
informative prior.

• The relevant population, from which the suspect and perpetrator were 
drawn. 

In the general case, all these populations are assumed different from each 
other in the sense that locus state frequencies may differ between them. 

For forensic DNA applications, WEFDNA motivates a plug-in recipe to
compute the LR, where values for $\theta$ and the $p_k$ are point-estimates made from
one or more reference databases. In this recipe, the $p_k$ are representative of
the frequencies in the reference populations, while the value of $\theta$ is chosen to
reflect by how much the corresponding frequencies in the relevant population
may differ. Small values of $\theta$ encode small expected differences and larger
values encode larger expected differences. WEFDNA motivates values in
the range $1\% < \theta < 5\%$ for most applications.



Our database $A$, as defined in section 4.2, is assumed to be drawn from
the relevant population, but in the usual forensic scenario, additional profiles
from the relevant population are not available. In our notation, this means
$A$ is empty.

In summary, in the WEFDNA plug-in recipe we set $L = n_k = 0$, the $p_k$
are generally different from $\frac{1}{2}$ and $\theta$ is smallish. This forms an informative
prior for the $q_k$. Substituting into (41) and (42) gives, for $r_k = s_k$:

$$\mathrm{LR}_k(1, 1) = \frac{1}{(1 - \theta) p_k + \theta}, \qquad \mathrm{LR}_k(0, 0) = \frac{1}{(1 - \theta)(1 - p_k) + \theta} \tag{43}$$
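The plug-in recipe described above amounts to evaluating the matched-locus LR of (41) and (42) with an empty database. A minimal Python sketch follows; the function name and the values of $p_k$ and $\theta$ are illustrative assumptions, not values taken from WEFDNA.

```python
from fractions import Fraction

def plugin_locus_lr(s, p, theta):
    """Matched-locus LR from (41)/(42) evaluated at L = n_k = 0."""
    freq = p if s == 1 else 1 - p  # reference frequency of the matched state
    return 1 / ((1 - theta) * freq + theta)

# Hypothetical reference frequency p_k = 1/10 and theta = 2%.
p, theta = Fraction(1, 10), Fraction(1, 50)
lr_rare = plugin_locus_lr(1, p, theta)    # matched state is the rarer one
lr_common = plugin_locus_lr(0, p, theta)  # matched state is the common one
```

With $\theta \to 0$ the first value would approach $1/p_k = 10$; the non-zero $\theta$ reduces it to $500/59 \approx 8.5$, reflecting the allowance for differences between reference and relevant populations.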

7 Fully Bayesian recipe 

Now we turn to the main purpose of this document, namely to explore a fully 
Bayesian recipe, where we start with a non-informative prior and use only 
the given data, A, r, s, to infer the model parameter. 

It must be emphasized that this fully Bayesian recipe cannot be used as is
to replace the plug-in recipe, because here we use the luxury of database $A$,
sampled from the relevant population. As noted above, in a realistic scenario,
we do not have this luxury: instead we have to make do with data sampled
from some other, somewhat different, reference population. Although a fully
Bayesian recipe could in principle be derived for this more realistic scenario,
this would come at the cost of a considerable increase in both conceptual
difficulty and computational complexity.

In this section, therefore, we assume we do have a database, $A$, sampled
from the relevant population, and the only difficulty that remains is to choose
the non-informative prior.

7.1 Which prior? 

We are now faced with making a choice amongst the different flavours of
non-informative priors. That is, we have to choose $\alpha_k$ and $\beta_k$, or equivalently
$p_k$ and $\theta$.

We concede that we are choosing a prior under the perhaps arbitrary
constraint that it should be a beta distribution. A more thorough motivation
for the prior should perhaps involve solving functional equations in the style
of PTLOS. We feel however that the beta distribution already provides a
rich enough space for the choice of prior. Moreover, as mentioned above,
the non-informative Haldane, Jeffreys and Laplace priors are all members of the
beta family.

To start, we motivate the choice $\alpha = \alpha_k = \beta_k$, or equivalently
$p_k = \frac{\alpha_k}{\alpha_k + \beta_k} = \frac{1}{2}$. Before we have seen any data, all loci are on an equal
footing, so that the priors for all $k$ must be the same. Next consider a database $A$
with an equal number of 0's and 1's for some locus $k$, so that $n_k = L - n_k$.
In this situation, there is no reason to prefer one state to the other, so
that the model parameter posterior should satisfy the symmetry condition:
$P(q_k|\alpha_k, \beta_k, L, n_k) = P(1 - q_k|\alpha_k, \beta_k, L, n_k)$, which is obtained at $\alpha_k = \beta_k$.
Another way to see this is simply to require $\mathrm{LR}_k(0, 0) = \mathrm{LR}_k(1, 1)$ when
$n_k = L - n_k$.

Now we have $p_k = \frac{1}{2}$ and we still need to choose $\theta$. To do this, consider
the case of the empty database, with $L = n_k = 0$, for which case we still
want our recipe to give a sensible answer. Now (41) and (42) give

$$\mathrm{LR}_k(1, 1) = \mathrm{LR}_k(0, 0) = \frac{1}{(1 - \theta)\frac{1}{2} + \theta} \tag{44}$$

When $A$ is empty, we now argue that we don't even know whether the locus
state varies in the population. So we are not justified in concluding that the
match at the locus modifies the probabilities for $H_p$ vs $H_d$. If we maximize
$\theta$ at the limit $\theta \to 1$, then we obtain the non-informative value of $\mathrm{LR}_k = 1$,
so that the DNA evidence is effectively disregarded.
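As a worked check of this argument, (44) can be simplified and evaluated at the three priors named in section 4.1.1, using the values of $\theta$ that correspond to them when $p_k = \frac{1}{2}$. This is only a restatement of (44), not an additional result:

```latex
\mathrm{LR}_k = \frac{1}{(1-\theta)\tfrac{1}{2} + \theta} = \frac{2}{1+\theta}
\quad\Rightarrow\quad
\text{Laplace } (\theta = \tfrac{1}{3}):\ \tfrac{3}{2}, \qquad
\text{Jeffreys } (\theta = \tfrac{1}{2}):\ \tfrac{4}{3}, \qquad
\text{Haldane } (\theta \to 1):\ 1
```

Of the three, only the Haldane limit leaves the empty-database LR at exactly 1.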

7.2 Analysis

Here we analyse the behaviour of $\mathrm{LR}_k(r_k, s_k)$, when $r_k = s_k$ and $\theta = 1$. We
get:

$$\mathrm{LR}_k(1, 1) = \frac{L + 1}{n_k + 1}, \qquad \mathrm{LR}_k(0, 0) = \frac{L + 1}{L + 1 - n_k} \tag{45}$$

We make several observations: 

• The matched likelihood ratios are bounded: $1 \le \mathrm{LR}_k(s, s) \le L + 1$.
We have already commented on the lower bound. The upper bound
is determined by the database size, $L$. This makes intuitive sense: the
larger the database, the more our maximum confidence grows. Note
however, that this maximum should be a relatively rare occurrence, as
shown below.

• For an empty database, if $L = n_k = 0$, then as discussed,
$\mathrm{LR}_k(1, 1) = \mathrm{LR}_k(0, 0) = 1$.

• For a non-empty database, as long as a locus $k$ has the same state in
all of the observed data, $A, r, s$, then the LR is still unity: if $n_k = L$,
then $\mathrm{LR}_k(1, 1) = 1$ and if $n_k = 0$, then $\mathrm{LR}_k(0, 0) = 1$.

• Conversely, for a given database size $L$, the maximum LR value is
reached when the locus state observed in $s_k = r_k$ has never been
observed in $A$. This implies the trait shared by the suspect and
perpetrator is rare. The larger the database size, $L$, the more we are convinced
of the rarity and the more we are convinced of the identity of suspect
and perpetrator.

• For a large database, where both $n_k \gg 1$ and $L - n_k \gg 1$, the likelihood
ratio for $s_k = r_k$ is the inverse of the frequency of the corresponding
event in the database: $\mathrm{LR}_k(1, 1) \approx \frac{L}{n_k}$ and $\mathrm{LR}_k(0, 0) \approx \frac{L}{L - n_k}$.

We can briefly compare this recipe to a very naive recipe, where we simply
assign $q_k = \frac{n_k}{L}$, irrespective of the size of the database. This would give
$\mathrm{LR}_k(1, 1) = \frac{L}{n_k}$ and $\mathrm{LR}_k(0, 0) = \frac{L}{L - n_k}$. This agrees with the last case above of
the Bayesian recipe, but in any other cases it could give overconfident results.
In particular, if $n_k = 0$, or $n_k = L$, one could get infinite LR values, which
would be ridiculous in the extreme if $L = 1$. The fully Bayesian recipe agrees
with the naive recipe when data is plentiful, but continues to give sensible
answers even when the data gets scarce to the point of vanishing.
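The comparison between the two recipes can be reproduced with a few lines of Python. The function names are hypothetical, and the naive recipe is coded to return `None` where its ratio would be infinite.

```python
from fractions import Fraction

def haldane_lr(s, n, L):
    """Matched-locus LR of (45), i.e. the Haldane limit theta -> 1."""
    seen = n if s == 1 else L - n  # times the matched state appears in A
    return Fraction(L + 1, seen + 1)

def naive_lr(s, n, L):
    """Plug-in of q_k = n_k / L; None where the ratio is undefined (infinite)."""
    seen = n if s == 1 else L - n
    return Fraction(L, seen) if seen > 0 else None

# Plentiful data (hypothetical counts): the two recipes nearly agree.
big_bayes = haldane_lr(1, 100, 1000)   # 1001/101, just under 10
big_naive = naive_lr(1, 100, 1000)     # exactly 10

# Scarce data: one database profile, matched state never seen.
tiny_bayes = haldane_lr(1, 0, 1)       # 2
tiny_naive = naive_lr(1, 0, 1)         # None, i.e. an infinite LR

# Empty database, and a locus that never varies: both give LR = 1.
empty_bayes = haldane_lr(1, 0, 0)
constant_bayes = haldane_lr(0, 0, 5)
```

The scarce-data case shows the difference most clearly: a single database profile yields a modest LR of 2 under the fully Bayesian recipe, where the naive recipe would claim certainty.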

7.2.1 Comment on Haldane prior

With a more realistic DNA model, where each STR locus has two independent
sides (paternal and maternal), we can gain some extra insight into the
nature of the Haldane prior. In this case, it can be shown (WEFDNA,
section 6.2.2) that when $L = 0$, the LR for a locus can nevertheless reach a
maximum of 3. If the paternal and maternal sides are the same, then we get
LR = 1, but if they are different, we get LR = 3. From this fact and the third
bullet above, we learn that:

The LR at locus $k$ becomes non-informative ($\mathrm{LR}_k = 1$) under the
Haldane prior, if and only if no state change has been observed
at locus $k$ in all of the data, $A, r, s$.

One may argue that loci used for forensic DNA profiling have been chosen
for the purpose of giving good discrimination between individuals, precisely
because they do vary appreciably between individuals and that therefore
the Haldane prior is too extreme. However, we are concerned here with
sub-populations, about which we cannot assume that every locus is informative:
it may well be that a certain locus is constant over the whole sub-population.
We therefore argue that the behaviour of the Haldane prior is appropriate:
the LR for a locus remains non-informative ($\mathrm{LR}_k = 1$), until we have observed
at least one state change in our data.